For this project, we explored two datasets – tweets about the Covid-19 vaccine and the reported side effects of the vaccine.
Firstly, we looked into what people are discussing when they tweet about the Covid-19 vaccine and explored their feelings about it. This would give us some insight into people’s general attitudes towards it.
Secondly, we focused on the adverse reactions reported from 2020-12-01 to 2021-03-31. The visualizations aim to show who is reporting side effects, how these people compare to the general population, and which symptoms are reported most often.
Altogether, our project aims to provide insights that can guide more appropriate actions in promoting Covid-19 vaccination, as well as more effective ways of informing people about side effects and relieving them.
load("./data/term_stemmed_all.RData")
This word cloud shows the popular keywords (those appearing more than 600 times) in tweets about the Covid-19 vaccine.
The most commonly mentioned word is of course “vaccine”, followed by “moderna” and “covid”, while “pfizer” and “pfizerbiontech” appear much smaller.
We can also see common keywords that seem to describe experiences (“dose”, “receive”, “today”), suggesting that many of these tweets record people’s vaccination experiences.
There is some discussion about China (“china” and “chinese”), likely because China also manufactures and distributes Covid-19 vaccines. It is worth noting that “sore” and “side” also appear frequently, which suggests that some users were tweeting about side effects.
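The frequency cutoff behind the word cloud can be sketched as follows. This is a minimal illustration, not the report's actual code: the `term_freq` data frame and its counts are hypothetical, since the structure of the stemmed term data is not shown here. The cloud itself is drawn only if the `wordcloud` package is installed.

```r
# Hypothetical term-frequency table; the real counts come from the stemmed
# tweet corpus, whose structure is not shown in this report.
term_freq <- data.frame(
  term = c("vaccine", "moderna", "covid", "dose", "sore", "rare"),
  freq = c(5200, 2100, 1900, 950, 640, 12),
  stringsAsFactors = FALSE
)

min_freq <- 600  # cutoff stated in the text: more than 600 occurrences

# Keep only terms that appear more than `min_freq` times, most frequent first.
popular <- term_freq[term_freq$freq > min_freq, ]
popular <- popular[order(-popular$freq), ]
print(popular)

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = popular$term, freq = popular$freq,
                       random.order = FALSE)
}
```

Filtering before plotting keeps the cutoff explicit and makes it easy to check which terms survive it.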
Four major clusters are detected. The largest has two central points, “vaccine” and “covid”. A second forms around the manufacturer “moderna” and may come from tweets reporting new progress on the Moderna vaccine. A third is more dispersed, with three centers (“today”, “russia” and “ontario”), and may come from tweets focused on vaccine export news. The fourth, at the intersection of the others, is also dispersed and has no central term.
The clusters are interwoven, but they offer some hints about the different popular topics. Readers can freely explore the network and look for the words they are interested in.
We clean the text of all tweets about vaccines and then apply VADER sentiment analysis to classify each tweet into one of three categories: positive, neutral, or negative.
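The three-way classification can be sketched as a threshold on the VADER compound score. The report does not state which cutoffs were used, so the ±0.05 thresholds below are an assumption (they are the values conventionally used with VADER), and `classify_compound` and the example scores are hypothetical.

```r
# Map a VADER compound score (range -1 to 1) to a sentiment label.
# The +/-0.05 cutoffs are the conventional VADER defaults, assumed here;
# the report does not state which thresholds were used.
classify_compound <- function(compound, threshold = 0.05) {
  ifelse(compound >= threshold, "positive",
         ifelse(compound <= -threshold, "negative", "neutral"))
}

# Hypothetical compound scores for four tweets.
scores <- c(0.62, 0.00, -0.41, 0.03)
labels <- classify_compound(scores)

# Count tweets per class, keeping a fixed class order.
print(table(factor(labels, levels = c("positive", "neutral", "negative"))))
```

Keeping the threshold as a function argument makes it easy to check how sensitive the positive/neutral/negative split is to the chosen cutoff.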
Now we can see people’s attitudes towards vaccines.
The bar chart shows that most tweets about vaccines are neutral or positive. Negative sentiment is comparatively rare.
The number of tweets posted over time follows a similar trend for all three sentiment types, perhaps because no particular event shifted people’s attitudes during this period.
We select the 50 most common words for each type of tweet.
The common words show clear differences between the three groups. In positive tweets, people share their happiness about the arrival of vaccines and give positive feedback after receiving a shot. Neutral tweets are mostly objective statements of vaccine news or information. In negative tweets, people worry about side effects and about whether the vaccines will work.
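Selecting the most common words per sentiment class can be sketched in base R as below. The `tweets` data frame and the `top_words` helper are hypothetical, and the sketch skips the stemming and stop-word removal that a real pipeline would normally apply first.

```r
# Hypothetical cleaned tweets with their sentiment labels.
tweets <- data.frame(
  text = c("got my first dose today so happy",
           "happy and grateful for the vaccine",
           "worried about side effects",
           "side effects scare me",
           "new vaccine shipment arrives monday"),
  sentiment = c("positive", "positive", "negative", "negative", "neutral"),
  stringsAsFactors = FALSE
)

# Return the n most frequent words in a character vector of texts.
top_words <- function(texts, n = 50) {
  words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  words <- words[nzchar(words)]  # drop empty strings left by the split
  head(sort(table(words), decreasing = TRUE), n)
}

# One frequency table per sentiment class.
top_by_sentiment <- lapply(split(tweets$text, tweets$sentiment), top_words, n = 50)
print(top_by_sentiment$negative)
```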
We can also look at the sentiment of the most popular tweets, ranked by number of favorites. Most of the top 15 are neutral or positive, which means that people did not show a preference for negative tweets.
The same holds when popularity is measured by number of retweets. Most of the top 15 tweets are again neutral or positive, which means that people did not preferentially retweet negative tweets and may have kept a positive attitude towards the effect of vaccines.
We also check the tweets of popular users (ranked by number of followers), because they have great influence on the public. Most of these users mainly post objective vaccine news or information, and they show more positive sentiment than negative sentiment.
# Packages used in this report (duplicate entries removed).
# Note: rgdal, rgeos and maptools have since been retired from CRAN.
packages <- c("devtools", "knitr", "tidyverse", "widgetframe", "readr",
              "wordcloud", "base64enc", "tidytext",
              "RWeka", "stats", "manifestoR", "readtext",
              "rvest", "stringr",
              "SnowballC", "plotrix", "tidyr",
              "dendextend", "ggthemes",
              "httr", "jsonlite", "DT", "textdata", "ggmap", "maptools", "mapproj", "rgeos", "rgdal",
              "RColorBrewer", "scales", "leaflet", "leafpop", "ggtext")
# Load each package, installing it first if it is missing.
invisible(lapply(packages, FUN = function(x) {
  if (!require(x, character.only = TRUE)) {
    install.packages(x)
    library(x, character.only = TRUE)
  }
}))
In general, women and younger people seem to report more side effects. As age increases, the number of reports actually decreases, especially for women. Do older people suffer more from side effects? Not exactly. Could it be that fewer older people were vaccinated and therefore filed fewer reports? We decided to dive deeper into who got vaccinated by looking at different age groups.
By March 31, 2021, more than 75% of people over 65 had received at least one dose of a Covid-19 vaccine, a much higher percentage than in the younger groups.
By March 31, 2021, among people over 75 years old, fewer than 3 of every 10,000 who had received at least one dose of a Covid-19 vaccine reported adverse effects. In contrast, those aged 30 to 39 reported the highest rate of adverse effects, at 6 of every 10,000 people who received the vaccine. So, counter-intuitively, older people seem to be less vulnerable to side effects. Some articles suggest that the immune response may be stronger in younger people, so their side effects may be more pronounced. Our data are consistent with this. However, there may be other reasons for the lower reporting rate among older people; for example, they may be less likely than younger people to use computers, which would lower the share of side effects that get reported.
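The rate used above is simply the number of reports divided by the number of vaccinated people, scaled to 10,000. A small worked sketch follows; the counts are hypothetical and chosen only to illustrate the arithmetic, not taken from the data behind the chart.

```r
# Hypothetical counts, chosen only to illustrate the arithmetic;
# they are not the figures behind the chart.
age_stats <- data.frame(
  age_group  = c("30-39", "65-74", "75+"),
  vaccinated = c(5000000, 20000000, 15000000),  # received at least one dose
  reports    = c(3000, 7000, 4200)              # adverse-event reports
)

# Reports per 10,000 vaccinated people.
age_stats$rate_per_10k <- age_stats$reports / age_stats$vaccinated * 1e4
print(age_stats)
```

Normalizing by the vaccinated population is what makes age groups comparable: a group can file more reports in absolute terms and still have a lower rate.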
Medical history and pre-existing illnesses are also strong indicators for identifying suitable candidates for Covid-19 vaccines, so we looked at the most common pre-existing illnesses among people who reported adverse reactions.
Hypertension, asthma, and diabetes are the pre-existing illnesses most often mentioned by people reporting side effects. One interpretation is that people with those diseases are more vulnerable to vaccine side effects. However, it is also possible that the reported symptoms were caused by the pre-existing illness rather than by the vaccine: people might attribute to the vaccine health problems that would have happened anyway. We have not found a more accurate way to describe this relationship, but it is worth exploring.
In general, the younger group seems to be more sensitive to side effects: they reported symptoms within 3 days of vaccination. Older people tend to feel symptoms later. Within the younger group, side effects tend to start earlier for males. In the older group, by contrast, females seem to experience symptoms earlier.
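The onset comparison can be sketched as the number of days between vaccination and first symptom, summarized by age group and sex. The `reports` data frame, its column names, and the two age groups are all hypothetical. The median is used here as one reasonable summary; the report does not say which statistic the chart shows.

```r
# Hypothetical report records; column names and values are assumptions.
reports <- data.frame(
  age_group  = c("under 65", "under 65", "65+", "65+"),
  sex        = c("M", "F", "M", "F"),
  vax_date   = as.Date(c("2021-01-10", "2021-01-12", "2021-02-01", "2021-02-03")),
  onset_date = as.Date(c("2021-01-11", "2021-01-14", "2021-02-06", "2021-02-06"))
)

# Days from vaccination to first symptom.
reports$onset_days <- as.numeric(reports$onset_date - reports$vax_date)

# Median onset delay for each age group and sex.
onset_summary <- aggregate(onset_days ~ age_group + sex, data = reports, FUN = median)
print(onset_summary)
```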
## <<VCorpus>>
## Metadata: corpus specific: 0, document level (indexed): 0
## Content: documents: 30095
Negative, disgust, fear, and sadness are the sentiment and emotion categories expressed most frequently in the symptom text, which is reasonable because people were describing uncomfortable feelings in their bodies.
Vaccine allocations show no brand preference at the state level. In every state, the number of doses allocated is roughly the same for the two brands.
states.sp <- readOGR(dsn = "data/hldata/cb_2018_us_state_5m/cb_2018_us_state_5m.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\hjm\Desktop\TO DO\DVfinal\Group_L_VaccineSideeffect-main (3)\Group_L_VaccineSideeffect-main\data\hldata\cb_2018_us_state_5m\cb_2018_us_state_5m.shp", layer: "cb_2018_us_state_5m"
## with 56 features
## It has 9 fields
## Integer64 fields read as strings: ALAND AWATER
# shape file source: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
# leaflet
# merge the per-state report rates into the shapefile attribute table
rate.sp <- states.sp
rate.sp@data <- rate.sp@data %>%
  left_join(case_vac, by = c('NAME' = 'state1'))
# one subset per winning party, used for the overlay layers
sub.d <- subset(rate.sp, party == 'DEMOCRAT')
sub.r <- subset(rate.sp, party == 'REPUBLICAN')
# pop-up content
rate.popup.content <- paste("State:", rate.sp@data$NAME, "<br/>",
                            "Report rate:", round(rate.sp@data$rate, 2), "per 100k vaccinated", "<br/>",
                            "Party:", rate.sp@data$party, "<br/>")
rate.popup.content.d <- paste("State:", sub.d@data$NAME, "<br/>",
                              "Report rate:", round(sub.d@data$rate, 2), "per 100k vaccinated", "<br/>",
                              "Party:", sub.d@data$party, "<br/>")
rate.popup.content.r <- paste("State:", sub.r@data$NAME, "<br/>",
                              "Report rate:", round(sub.r@data$rate, 2), "per 100k vaccinated", "<br/>",
                              "Party:", sub.r@data$party, "<br/>")
# choropleth of report rates, with toggleable party overlays
leaflet() %>%
  addProviderTiles("OpenStreetMap.Mapnik") %>%
  setView(lat = 37, lng = -95, zoom = 4) %>%
  addPolygons(group = "rate",
              data = rate.sp,
              stroke = TRUE,
              smoothFactor = 0.5,
              weight = 1,
              color = '#333333',
              opacity = 1,
              fillColor = ~colorQuantile("Oranges", rate)(rate),
              fillOpacity = 1,
              popup = rate.popup.content) %>%
  addPolygons(group = "Democrat",
              data = sub.d,
              popup = rate.popup.content.d,
              opacity = 1.0, stroke = TRUE,
              color = "blue", weight = 1) %>%
  addPolygons(group = "Republican",
              data = sub.r,
              popup = rate.popup.content.r,
              opacity = 1.0, stroke = TRUE,
              color = "red", weight = 1) %>%
  addLayersControl(overlayGroups = c("rate", "Democrat", "Republican"),
                   options = layersControlOptions(collapsed = FALSE))
The reporting rate of vaccine side effects in each state does not seem to be clearly related to which party won the state in the 2020 election. However, New York has the highest reporting rate, more than double that of Montana, the second highest.
# leaflet
# merge the per-manufacturer report counts into the shapefile attribute table
cm.sp.p <- states.sp
cm.sp.p@data <- cm.sp.p@data %>%
  left_join(subset(case_manu, manu == "pfizer"), by = c('NAME' = 'state1'))
cm.sp.m <- states.sp
cm.sp.m@data <- cm.sp.m@data %>%
  left_join(subset(case_manu, manu == "moderna"), by = c('NAME' = 'state1'))
# pop-up content
cm.p.popup.content <- paste("State:", cm.sp.p@data$NAME, "<br/>",
                            "Report case:", cm.sp.p@data$case_manu, "<br/>",
                            "Manufacturer:", cm.sp.p@data$manu, "<br/>",
                            "Party:", cm.sp.p@data$party, "<br/>")
cm.m.popup.content <- paste("State:", cm.sp.m@data$NAME, "<br/>",
                            "Report case:", cm.sp.m@data$case_manu, "<br/>",
                            "Manufacturer:", cm.sp.m@data$manu, "<br/>",
                            "Party:", cm.sp.m@data$party, "<br/>")
# one toggleable choropleth layer per manufacturer
leaflet() %>%
  addProviderTiles("OpenStreetMap.Mapnik") %>%
  setView(lat = 37, lng = -95, zoom = 4) %>%
  addPolygons(group = "pfizer",
              data = cm.sp.p,
              stroke = TRUE,
              smoothFactor = 0.5,
              weight = 1,
              color = '#333333',
              opacity = 1,
              fillColor = ~colorQuantile("Purples", case_manu)(case_manu),
              fillOpacity = 0.5,
              popup = cm.p.popup.content) %>%
  addPolygons(group = "moderna",
              data = cm.sp.m,
              stroke = TRUE,
              smoothFactor = 0.5,
              weight = 1,
              color = '#333333',
              opacity = 1,
              fillColor = ~colorQuantile("Oranges", case_manu)(case_manu),
              fillOpacity = 0.5,
              popup = cm.m.popup.content) %>%
  addLayersControl(overlayGroups = c("pfizer", "moderna"),
                   options = layersControlOptions(collapsed = FALSE))
According to the previous visualizations, there is no significant difference between the number of Moderna and Pfizer vaccine allocations in each state. However, Pfizer’s vaccine has more reported cases of side effects.